ABSTRACT


A  study  was  conducted  to  extend  a  descriptive  structure  for 
measuring  human  performance  during  training  to  a  fixed-wing, 
high-performance  aircraft  simulation,  and  to  develop  measure 
selection  statistical  techniques.  The  effort  required:  (1) 
definition  of  candidate  performance  measures  for  the  simulated 
flight  task,  (2)  development  of  computer  programs  to  acquire  raw 
data  and  produce  candidate  measures  for  18,  one-hour  training 
sessions  with  four  participants,  and  (3)  most  especially,  to 
develop  methods  to  reduce  the  resulting  candidate  measures  to  a 
small  and  efficient  set  which  reflects  the  skill  change  that 
occurs  as  a  function  of  training. 

It was desired that the resultant measurement have the
capability of: (1) discriminating between different levels of
proficiency and (2) predicting later performance based on
measures of current performance. Therefore, two measure
selection methods were developed. One was based in part on a
multiple discriminant analysis model. The second was based
in part on a canonical correlation model.

The multiple discriminant procedure was able to reduce
measures to an efficient set which could discriminate between
early and later training performance, and produced weights for
the summation of individual measures into one composite score.
Minor improvements in the method were suggested.


The  canonical  correlation  procedure  to  choose  measures 
which  predict  later  performance  worked  also,  but  the  data 
revealed  the  need  for  additional  criteria  in  the  selection  of 
predictive  measures.  More  comprehensive  algorithms  were 
suggested. 


It  was  concluded  that  additional  data  should  now  be 
collected  to  verify  the  results  with  a  large  number  of 
participants.  Real-time,  or  near-real-time  production  of 
measures  while  training  is  in  progress  should  be  attempted  in 
an  automated  flight  trainer. 
FOREWORD


This report documents the current status of ongoing man-machine
training performance measurement method development. A previous report
(Vreuls, Obermayer, Lauber, and Goldstein, 1973) emphasized the development
of a descriptive structure for obtaining measurement in a man-machine
training situation. This report emphasizes the development and current
status of measure selection techniques based on multivariate analyses, which
were explored as a means of selecting measures rather than in their more
traditional use as a means of personnel selection and classification.
Further work on measure selection techniques is necessary, and is ongoing
under the direction of NAVTRAEQUIPCEN and the sponsorship of the Advanced
Research Projects Agency.

The  views  and  conclusions  contained  in  this  document  are  those  of  the 
authors  and  should  not  be  interpreted  as  necessarily  representing  the 
official  policies,  either  expressed  or  implied,  of  the  Advanced  Research 
Projects  Agency  or  the  United  States  Government. 

SECTION I

INTRODUCTION  AND  SUMMARY 

Performance measurement produces information needed for a
specific purpose, such as the evaluation of trainee performance
or the conduct of training. Performance measurement is therefore
vital to improved training or improved evaluation. Typically,
military man-machine system performance measurement involves
the processing of large quantities of continuously varying
information; consequently, such measurement is beyond the
capabilities of manual processing and simple measurement devices,
and thus must be automated.

Automation,  however,  places  severe  demands  on  exact 
definition  of  the  conditions  during  which  measurement  takes 
place  and  a  succinct  definition  of  measures  which  have  utility. 
The  definition  of  useful  measures  itself  has  been  a  major 
technical  challenge  (e.g.,  Smode,  1971;  Vreuls  and  Obermayer, 
1971b).  Where  performance  measurement  has  been  used,  it  has 
been  selected,  commonly,  on  the  basis  of  "common  practice"  or  on 
the  basis  of  an  analysis  of  the  skills,  knowledges,  task 
components  and/or  mission  objectives.  Several  studies  (cf., 
Vreuls  and  Obermayer,  1971a;  Vreuls,  et  al.,  1973;  Knoop  and 
Welde,  1973)  have  emphasized  that  analytic  methods  alone  fail  to 
satisfactorily  define  measurement. 


Measurement defined only on the basis of common practice or
analysis is likely to be overabundant, unwieldy, and perhaps
impossible to implement in an operational setting. The large
quantities of information thus produced are likely to include
(1) different ways to measure the same behavior and (2) measures
of behavior and system performance which may prove to be
unimportant. Although the measurement development process must
start with a good analysis, it is necessary to seek empirical
methods to reduce measurement to a small, efficient set.

The reduction of initially defined measures into a set
which can be shown, mathematically, to have the desired
properties is called the measure selection process. Previous
research by the authors established and tested a descriptive
structure for obtaining measurement in a man-machine training
situation. The primary emphasis of the work reported herein
was the design and development of measure selection techniques
based on multivariate statistical models, which consider the
total set of measures rather than individual measures without
regard to what is happening to other measures at the same time.
SUMMARY OF METHOD

An empirical method was used to develop the measure
selection techniques. Data were collected while human
participants underwent 18 one-hour training sessions. Raw data
were converted to analytically defined performance measures
through the use of measure-producing programs which read the
data tapes at the conclusion of training. The performance
measures were then used as a data base for the development of
multivariate statistical selection techniques.

Measure selection development was oriented to the use of
measurement within automated, adaptive flight training systems.
It was desired that the resultant measurement have the capability

(1) to discriminate between different levels of proficiency, and

(2) to predict later performance based on measures of current
performance.

RESULTS 

As  it  was  defined  herein,  the  discriminant  procedure  worked 
for  measure  selection.  In  one  of  the  test  cases,  24  initial 
measures  were  reduced  to  seven  which  could  discriminate  between 
early  and  later  training  performance.  The  procedure  also 
produced  the  weights  for  summing  the  measures  into  one  composite 
score.  Minor  improvements  were  recommended. 

The canonical correlation procedure to choose measures
which predict later performance also worked; however, the data
revealed the need for more complex criteria in the selection of
predictive measures. More comprehensive algorithms were
suggested.

IMPLICATIONS  FOR  FURTHER  RESEARCH 

It is possible to mathematically define an efficient set of
measures which change significantly during training of
psychomotor skills for flight control. Thus, the discriminant
technique should be applied to automated training, and to flight
training where instrumentation and support subsystems are
available. Minor method improvements should be undertaken to
fine-tune the discriminant procedure, as suggested herein.

Further design and testing of the predictive measure
selection method is needed. It was felt that, with additional
data and with the suggested program changes, the next iteration
of the predictive procedure should solve many of the presently
encountered problems. However, the problems of assessing proper
criteria for performance prediction should wait for data
collection in training programs with a broader scope than
considered in this study.
SECTION II

A  METHOD  FOR  MEASUREMENT  DEFINITION 


ANALYSIS  FOR  MEASUREMENT 


Detailed analyses of missions and human operator tasks are
conventional ways to provide foundation information for the
study of man-machine problems. These analyses provide a concise
description of the various separate parts of the mission, that
which is to be achieved during the mission, the various
sequential and parallel activities taking place, the specific
human operator tasks, and criteria for the performance of human
operator functions and the accomplishment of the mission. For
the purposes of measurement definition it is desirable to achieve
an operational description of the mission and tasks, that is, a
definition of the overt, clearly identifiable operations taking
place which are directly or indirectly affected by the human
operator.


Analysis  for  comprehensive  measurement  begins  with  a 
complete  decomposition  of  the  mission  into  smaller  parts  for 
which  activities  and  criteria  are  more  easily  defined.  For 
example,  the  mission  may  be  decomposed  into  separate  maneuvers 
showing  the  normal  and  alternative  sequences  of  maneuvers.  The 
maneuvers may then be further divided into segments. Through
this procedure the mission is divided into many parts and the
total measurement problem is correspondingly divided into smaller
problems.

One of the most difficult aspects of automated measurement,
in practice, is to identify these parts clearly enough that a
computer can be programmed to recognize a segment or maneuver
and take the appropriate measurement. One must be able to define
operationally, without equivocation, when a segment starts, so
that appropriate measurement calculations can begin, and when the
segment ends, so that measurement stops. This is termed
start/stop logic in this report (e.g., if specified conditions
are met, then start measuring; and if other conditions are met,
then stop measuring).
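Start/stop logic can be illustrated by scanning a sampled record for the intervals it defines. The following is a minimal sketch, not the report's measure-producing program; the names `find_segments` and `Sample`, and the altitude thresholds in the example, are hypothetical. Each sample is assumed to be a mapping from parameter name to value, and the scan resumes after every stop so that a segment repeating within a record is found again.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Sample = Dict[str, float]  # hypothetical: one time sample, parameter name -> value


def find_segments(samples: Sequence[Sample],
                  start: Callable[[Sample], bool],
                  stop: Callable[[Sample], bool]) -> List[Tuple[int, int]]:
    """Return (start_index, stop_index) pairs for every occurrence of a segment.

    Measurement begins at the first sample where `start` is true and ends at the
    first later sample where `stop` is true; scanning then resumes, so a
    segment that repeats within the record is found again.
    """
    segments: List[Tuple[int, int]] = []
    begin: Optional[int] = None
    for i, sample in enumerate(samples):
        if begin is None:
            if start(sample):
                begin = i
        elif stop(sample):
            segments.append((begin, i))
            begin = None
    if begin is not None:  # record ended while still measuring
        segments.append((begin, len(samples) - 1))
    return segments


# Hypothetical example: measure while altitude is above 10 units,
# stopping once it falls below 5.
alt = [0, 0, 5, 12, 20, 18, 8, 2, 9, 15, 4]
samples = [{"alt": a} for a in alt]
print(find_segments(samples,
                    start=lambda s: s["alt"] > 10,
                    stop=lambda s: s["alt"] < 5))  # [(3, 7), (9, 10)]
```

The stop test is applied only from the sample after the start, so a single sample cannot both open and close a segment; other conventions are equally defensible.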

Within each segment, measurement is conceivably possible
at a minimum of two levels: (1) measurement of the total
man-machine system for comparison to expected mission goals, and
(2) measurement of human operator activity in relation to design
expectations. It is also possible to increase the number of
hierarchical levels for measurement by also measuring the
performance of the various subsystems, including the human
operator.

At any hierarchical level of measurement, the measures may
be defined in terms of the system state variables, that is, those
parameters (e.g., altitude, airspeed, angle of attack) which are
totally sufficient for the description of system behavior. In
fact, with system equations defined in terms of the state
variables, one should be capable of predicting future
system states. The remaining primary task of measurement is the
definition of calculations, or transformations, to produce
measures (or metrics) in terms of system parameters during the
intervals defined by start/stop logic.

An  analysis  for  measurement  will  reflect  all  activities 
occurring  during  a  mission  which  may  affect  mission  success. 
Unless  one  can  somehow  remove  portions  of  the  mission  from 
consideration,  a  set  of  measures  will  be  produced  which  will 
attempt  to  reflect  everything  going  on.  In  effect,  we  implement 
the  policy,  "If  it  moves,  measure  it."  To  be  practical,  we 
should  attempt  to  be  efficient,  and  certainly  should  remove  all 
irrelevant  measurement.  Analyses  conducted  for  measurement 
should  strive  to  simplify  and  remove  irrelevant  measurement. 

This will probably be accomplished only to the extent that

(1) the analyst fully understands the tasks of the human
operator and their relationships to system performance, and

(2) research has sufficiently examined similar cases and
alternative forms of measurement.

Since these conditions are seldom met, the analyst is likely to
be conservative and create an excessively large set of measures.

A  STRUCTURE  FOR  MEASURE  DEFINITION 

Analyses suggest that most maneuvers can be thought of as
collections of different segments for measurement purposes.

A segment is any portion of a maneuver in which the desired
behavior of a trainee or resulting system performance is
relatively constant or follows a lawful relationship from
beginning to end. Just as a primary task may continue while
two subtasks proceed sequentially, measurement segments may
overlap. Also, segments may repeat within a maneuver.
Measurement sets within similar segments of any maneuver should
be equivalent, although the desired value of some parameters
might change.

The beginning and end of a measurement segment should be
defined as a logical consequence of Boolean and relational
expressions. Several relational expressions may be required to
remove ambiguity. For example, one might define helicopter
lift-off when:

{(altitude exceeds its initial value by more than
one foot)

.OR.

(altitude rate exceeds 50 feet per minute)

.AND.

(collective control is greater than 20 degrees)

.AND.

(torque is greater than 50 percent)}
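The lift-off expression can be written as a single Boolean test on the current sample. This is a minimal sketch; the function name `lift_off` and its argument names and units are hypothetical. It assumes the two altitude conditions are meant as alternatives that must be combined with both the collective and torque conditions. That grouping has to be made explicit, because under FORTRAN operator precedence (.AND. binds tighter than .OR.) the expression as written would read A .OR. (B .AND. C .AND. D).

```python
def lift_off(alt_ft: float, alt_init_ft: float, alt_rate_fpm: float,
             collective_deg: float, torque_pct: float) -> bool:
    """Hypothetical helicopter lift-off start condition.

    Assumes the intended grouping (A .OR. B) .AND. C .AND. D, where
    A = altitude more than 1 ft above its initial value,
    B = altitude rate above 50 ft/min,
    C = collective above 20 degrees, D = torque above 50 percent.
    """
    climbing = (alt_ft - alt_init_ft > 1.0) or (alt_rate_fpm > 50.0)
    return climbing and collective_deg > 20.0 and torque_pct > 50.0


print(lift_off(101.5, 100.0, 0.0, 25.0, 60.0))  # True: altitude up 1.5 ft
print(lift_off(101.5, 100.0, 0.0, 15.0, 60.0))  # False: collective too low
```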


Specific start/stop functions used in the current study, and the
logical operators for combining these functions, are listed in
tables 1 and 2, respectively.

TABLE 1. GLOSSARY OF START/STOP FUNCTIONS

MNEMONIC  FUNCTION            START/STOP WHEN:
B                             Beginning of Record
E                             End of Record
P                             End, Best Fit Power of 2
G         PAR > DSR           Parameter Greater than Desired Value
L         PAR < DSR           Parameter Less than Desired Value
O         |PAR - DSR| > TOL   Absolute value of parameter minus
                              desired value is greater than (outside
                              of) tolerance
I         |PAR - DSR| < TOL   Absolute value of parameter minus desired
                              value is less than (inside) tolerance
CO        |PAR - INIT| > TOL  Absolute value of parameter minus its
                              initial value is greater than tolerance
                              (or the change from initial is outside
                              of tolerance)
CI        |PAR - INIT| < TOL  Absolute value of parameter minus its
                              initial value is less than the tolerance

These  functional  expressions  were  sufficient  for  the  current 
development;  they  could  be  expanded  as  necessary. 
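The parameter tests of table 1 map directly onto small predicates. The sketch below is illustrative only; `start_stop_function` and `best_fit_power_of_two` are hypothetical names. B and E simply select the first and last sample of the record, so they need no predicate. The reading of P as "the largest power-of-2 number of samples that fits in the record" (a convenient length for Fourier methods) is an assumption, since the table gives only the mnemonic.

```python
from typing import Callable, Dict

Predicate = Callable[[float], bool]


def start_stop_function(mnemonic: str, dsr: float = 0.0, tol: float = 0.0,
                        init: float = 0.0) -> Predicate:
    """Return a test on one parameter value (PAR) for a table 1 mnemonic.

    dsr is the desired value, tol the tolerance, init the parameter's initial value.
    """
    functions: Dict[str, Predicate] = {
        "G": lambda par: par > dsr,
        "L": lambda par: par < dsr,
        "O": lambda par: abs(par - dsr) > tol,
        "I": lambda par: abs(par - dsr) < tol,
        "CO": lambda par: abs(par - init) > tol,
        "CI": lambda par: abs(par - init) < tol,
    }
    return functions[mnemonic]


def best_fit_power_of_two(n_samples: int) -> int:
    """Assumed reading of 'P': largest power of 2 not exceeding the record length."""
    if n_samples < 1:
        raise ValueError("record must contain at least one sample")
    return 1 << (n_samples.bit_length() - 1)


outside = start_stop_function("O", dsr=100.0, tol=5.0)
print(outside(106.0), outside(103.0))  # True False
print(best_fit_power_of_two(1000))     # 512
```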


NAVTRAEQUIPCEN  73-C-0066-1 

TABLE 2. GLOSSARY OF LOGICAL OPERATORS FOR
COMBINING START/STOP FUNCTIONS¹

MNEMONIC  EACH PAIR OF FUNCTIONS (f1, f2) IS EVALUATED TRUE IF:
A         f1 is True and f2 is True
O         f1 is True or f2 is True
N         f1 is True and f2 is False
R         f1 is False and f2 is False

¹ These logical operations were sufficient for the current
development; obviously, they could be expanded as necessary.
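The four operators of table 2 can be expressed as functions of two Boolean results. This is a minimal sketch; the dictionary name `COMBINE` is hypothetical. It prints the truth table of each operator so it can be checked against the table entry by entry.

```python
from itertools import product
from typing import Callable, Dict

Combiner = Callable[[bool, bool], bool]

# Table 2 operators applied to one pair of start/stop function results (f1, f2).
COMBINE: Dict[str, Combiner] = {
    "A": lambda f1, f2: f1 and f2,
    "O": lambda f1, f2: f1 or f2,
    "N": lambda f1, f2: f1 and not f2,
    "R": lambda f1, f2: not f1 and not f2,
}

# Truth table in the order (T,T), (T,F), (F,T), (F,F).
for mnemonic, op in COMBINE.items():
    print(mnemonic, [op(f1, f2) for f1, f2 in product([True, False], repeat=2)])
# A [True, False, False, False]
# O [True, True, True, False]
# N [False, True, False, False]
# R [False, False, False, True]
```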


Thus, four observations for defining maneuver segmentation
evolve from measurement analyses. First, maneuvers can be
partitioned into any number of segments in which the determinants
of performance can be mathematically defined and for which the
conditions for starting and stopping measurement can be
unambiguously defined. Second, within any maneuver an identical
segment may repeat. Third, different maneuvers may contain
identical segments. Fourth, segments for measurement purposes
may overlap.

Having defined the conditions for measurement, a measure set
can be constructed to represent all the trainee performance
information which is desired for that segment. The set can
contain an unlimited number of performance measures, each
specified in terms of a parameter, a sampling rate, a desired
value (if appropriate), a transformation, and a tolerance (if the
transformation requires one). A parameter is defined as a measure
of (a) vehicle states in any internal or external reference plane,
such as pitch or roll attitude, (b) personnel physiological or
positional states, such as heart rate or eye movement, (c) control
device states, such as stick position, or (d) discrete events,
such as switch positions. The sampling rate is the frequency at
which the parameter is sampled. Sometimes the value of the
parameter means nothing unless it is compared to a desired value
to derive an error score. Finally, a transformation is the
mathematical treatment of the parameter, such as a scalar value,
a mean, a variance, a Fourier transform, etc. Common
transformations used in manned vehicle research are shown in
table 3. Specific transformations used in this study are
presented in table 4.
An adequate description of multidimensional human
operator performance will require many measures. Each measure
of the set must be defined in terms of all of the following
determinants:

a. Maneuver

b. Segment (when measurement starts and stops)

c. Parameter

d. Sampling Rate

e. Desired Value (if required)

f. Tolerance Value (if required)

g. Transformation.
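The seven determinants above amount to a record type for one candidate measure. The following is a hedged sketch; the class `MeasureDefinition`, its field names, the assumption that sampling rate is in hertz, and the example maneuver and segment names are all hypothetical. Because Python dataclasses require fields with defaults to come last, the optional desired value and tolerance appear after the transformation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MeasureDefinition:
    """One candidate measure, specified by determinants a-g."""
    maneuver: str            # a. Maneuver
    segment: str             # b. Segment (named start/stop condition pair)
    parameter: str           # c. Parameter
    sampling_rate_hz: float  # d. Sampling rate (assumed to be in hertz)
    transformation: str      # g. Transformation (a table 4 mnemonic)
    desired_value: Optional[float] = None  # e. Desired value, if required
    tolerance: Optional[float] = None      # f. Tolerance value, if required

    def __post_init__(self) -> None:
        if self.sampling_rate_hz <= 0:
            raise ValueError("sampling rate must be positive")

    def label(self) -> str:
        return f"{self.maneuver}/{self.segment}/{self.parameter}/{self.transformation}"


# Hypothetical measure: time out of tolerance on altitude during a level-off.
m = MeasureDefinition("Climb", "Level-off", "altitude", 10.0, "TOT",
                      desired_value=5000.0, tolerance=100.0)
print(m.label())  # Climb/Level-off/altitude/TOT
```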

TABLE 3. COMMON MEASURE TRANSFORMATIONS

TIME HISTORY MEASURES
  Time on Target
  Time Out of Tolerance
  Maximum Value Out of Tolerance
  Response Time, Rise Time, Overshoot
  Frequency Domain Approximations
  Count of Tolerance Band Crossings
  Zero or Average Value Crossings
  Derivative Sign Reversals
  Damping Ratio

AMPLITUDE-DISTRIBUTION MEASURES
  Mean, Median, Mode
  Standard Deviation, Variance, Quartile Range
  Minimum/Maximum Value
  Root-Mean-Squared Error, Mean-Squared Error
  Absolute Average Error

FREQUENCY DOMAIN MEASURES
  Autocorrelation Function
  Power Spectral Density Function
  Bandwidth
  Peak Power
  Low/High Frequency Power
  Bode Plots, Fourier Coefficients
  Amplitude Ratio
  Phase Shift
  Transfer Function Model Parameters
  Quasi-Linear Describing Function
  Cross-Over Model

TABLE 4. GLOSSARY OF TRANSFORMATIONS

MNEMONIC  TRANSFORMATION
INIT      Initial Scalar Value
FINL      Final Scalar Value
AINI      Absolute Initial Scalar Value
AFIN      Absolute Final Scalar Value
MIN       Minimum Value
MAX       Maximum Value
AVG       Average Value, $\frac{1}{N}\sum_{i=1}^{N} x_i$
AAE       Average Absolute Value, $\frac{1}{N}\sum_{i=1}^{N} |x_i|$
EE'S      Error Squared Value, $\frac{1}{N}\sum_{i=1}^{N} x_i^2$
VAR       Variance, $\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2$
RMS       Root-Mean-Square, $\left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)^{1/2}$
SDV       Standard Deviation,
          $\left[\frac{1}{N-1}\left(\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right)\right]^{1/2}$
TOT       Time Out of Tolerance in Seconds and Tenths
RNG       Range, Distance Between the Largest and Smallest Value
ELT       Elapsed Time in Seconds and Tenths
ZRX       No. Zero Crossings per Second
AVX       No. Average Crossings per Second
AUTO      Auto Covariance Function
PERD      Periodicity of Auto Covariance Function, the tau shift
          values and covariance at peaks.
MLTR      Multiple Regression of a Parameter x and its derivative
          (ẋ) on Parameter y (Cooley and Lohnes, 1962). This
          particular transform computes successive multiple
          regressions of x, ẋ on later (tau) values of y (as in an
          auto covariance function) until the maximum multiple
          regression coefficient is found. It returns (1) tau in
          seconds, (2) the coefficient of multiple regression,
          (3) the beta weights, and (4) the B-weights at the point
          of maximum multiple regression.
HARM      Harmonic Analysis using procedures outlined by Blackman
          and Tukey (1959), Cooley and Tukey (1965), and Villasenor
          (1968); produces the power spectral density function for
          the requested bandwidth.
FLTR      Relative power between 2 and 6 radians per second using a
          pair of low-pass second-order digital filters as described
          by Norman (1973).
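A few of the table 4 transformations can be computed from one sampled segment in a handful of lines. This is a minimal sketch, not the report's program; the function name `transform_subset` is hypothetical, and it covers only AVG, AAE, RMS, SDV, RNG, ELT, ZRX, and TOT. It assumes x is the error score PAR - DSR when a desired value is supplied (so ZRX then counts crossings of the desired value), and that each sample represents 1/rate_hz seconds.

```python
import math
from typing import Dict, Optional, Sequence


def transform_subset(par: Sequence[float], rate_hz: float,
                     dsr: Optional[float] = None,
                     tol: Optional[float] = None) -> Dict[str, float]:
    """Compute a few table 4 transformations for one parameter over one segment.

    If a desired value is given, x is the error score PAR - DSR; otherwise x = PAR.
    Each sample is assumed to represent 1/rate_hz seconds.
    """
    x = [p - dsr for p in par] if dsr is not None else list(par)
    n = len(x)
    s1 = sum(x)
    s2 = sum(v * v for v in x)
    elapsed = n / rate_hz
    # A crossing is a change of sign between consecutive samples.
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    out = {
        "AVG": s1 / n,
        "AAE": sum(abs(v) for v in x) / n,
        "RMS": math.sqrt(s2 / n),
        # N-1 divisor, following the SDV entry in table 4
        "SDV": math.sqrt((s2 - s1 * s1 / n) / (n - 1)) if n > 1 else 0.0,
        "RNG": max(x) - min(x),
        "ELT": elapsed,
        "ZRX": crossings / elapsed,
    }
    if tol is not None:
        out["TOT"] = sum(1 for v in x if abs(v) > tol) / rate_hz
    return out


print(transform_subset([101.0, 99.0, 102.0, 98.0], rate_hz=2.0, dsr=100.0, tol=1.5))
```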


A representation of the assumed structure for measurement
is shown in figure 1. As can be seen, it is hierarchical in
nature. Objective performance for any trainee on any training
day can be represented by a collection of measures for each
maneuver. Each maneuver can contain any number of segments.
Identical segments may repeat within a maneuver. Similar
segments may appear in different maneuvers. Maneuver
segmentation defines when measurement starts and stops.

Within a segment any number of single or multiple parameter
transformations may be employed. An unlimited number of
transformations may be computed on any parameter.
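One way to hold measures in this hierarchy is a nested mapping from trainee to day to maneuver to segment, with a list at the bottom because a segment may repeat. The sketch below is an assumption about storage, not the report's data format; the class `PerformanceRecord`, its methods, and the example names and values are hypothetical. It also shows gathering one segment type across maneuvers, since similar segments may appear in different maneuvers.

```python
from collections import defaultdict
from typing import Dict, List

Measures = Dict[str, float]  # measure label -> value for one segment occurrence


class PerformanceRecord:
    """Trainee -> day -> maneuver -> segment -> list of segment occurrences."""

    def __init__(self) -> None:
        self._data = defaultdict(lambda: defaultdict(
            lambda: defaultdict(lambda: defaultdict(list))))

    def add(self, trainee: str, day: int, maneuver: str, segment: str,
            measures: Measures) -> None:
        # A list, because identical segments may repeat within a maneuver.
        self._data[trainee][day][maneuver][segment].append(measures)

    def segment_everywhere(self, trainee: str, day: int, segment: str) -> List[Measures]:
        """All occurrences of one segment type across every maneuver flown that day."""
        maneuvers = self._data[trainee][day]
        return [m for segs in maneuvers.values() for m in segs.get(segment, [])]


rec = PerformanceRecord()
rec.add("P1", 1, "Climb", "Level-off", {"alt RMS": 30.0})
rec.add("P1", 1, "Climb", "Level-off", {"alt RMS": 25.0})    # repeated segment
rec.add("P1", 1, "Descent", "Level-off", {"alt RMS": 40.0})  # same segment, other maneuver
print([m["alt RMS"] for m in rec.segment_everywhere("P1", 1, "Level-off")])  # [30.0, 25.0, 40.0]
```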


Figure 1. Representation of Measurement Model Components
THE CANDIDATE MEASURE SET

The  measurement  produced  as  a  result  of  mission  and  task 
analysis,  defined  using  the  foregoing  measurement  structure,  will 
be  extensive  for  such  applications  as  flight  training.  If  the 
method  is  systematically  applied,  all  human  operator  activity  for 
which  the  analyst  suspected  a  relation  to  mission  performance 
will  be  measured.  The  selection  of  measurement  also  depends  on 
the  availability  of  knowledge  of  human  performance  and  system 
models  throughout  the  mission.  The  analyst  may  have  some 
difficulty,  therefore,  in  determining  whether  measurement  of 
some  segments  is  important,  and,  in  determining  which  of  several 
measurement  alternatives  are  appropriate  and  best. 

One  procedure  is  to  be  very  conservative:  Measure  if  there 
is  any  reasonable  doubt  whether  the  measure  can  be  excluded, 
and  implement  alternative  forms  of  measurement  if  a  clear-cut 
choice  cannot  be  made.  The  result  is  a  set  of  measures  which 
is  almost  certainly  redundant  and  too  large.  This  set  of 
measures  is  then  used  as  an  initial  candidate  set  from  which 
the  final  and  more  efficient  measure  set  may  be  selected.  Since 
presumably  analytic  resources  have  been  exhausted,  the  candidate 
measure  set  can  be  reduced  only  through  test  with  a  modest  number 
of  human  subjects  performing  tasks  identical  to,  or  related  to, 
those  involved  in  the  mission.  When  additional  data  are 
empirically  collected,  further  reduction  of  the  candidate  measure 
set  should  be  possible. 

A small set of measures is highly desirable from a number of
points of view. Measurement while a student is performing is
desirable for the purposes of computerized automated training,
and sufficient computing time is not available to produce large
quantities of measures while training is in progress. Also,
measurement may be intended for use with airborne instrumentation,
for which the capability for measurement is very restricted.
Finally, large quantities and types of measurement make
interpretation of results quite difficult, whether the consumer
of the information is a research scientist or an instructor.

MEASURE SELECTION CRITERIA. Reduction of the candidate measure
set can be based on an analysis of data collected through a trial
application of the measures; but, as a rather large number of
measures is typical, and a number of subjects and trials will be
required for an adequate statistical sample, computer analysis is
indicated. The criteria for selection must then be defined in
quantitative operational form to permit machine selection. When
the criteria are clearly stated, the type of computer programs
required to mechanize the selection should also be apparent.

But on what grounds should a specific measure be excluded
from further consideration? After consideration of the needs for
measurement in training (Vreuls and Obermayer, 1971a, 1971b;
Vreuls, et al., 1973), three general criteria have emerged:

(1)  If  two  measures  provide  the  same  information  for  a 
given  application,  one  member  of  the  redundant  pair  may  be 
discarded. 

(2)  Measures  may  be  discarded  if  they  are  not  sensitive  to 
performance  differences  between  individuals.  The  measurement 
retained  should  be  able  to  discriminate  between  "good"  and  "bad" 
performers,  students  and  instructors,  and  performance  by  a 
student  early  in  training  compared  to  that  later  in  training. 

(3) Measures should also be retained if they lend themselves
to early prediction of performance to be achieved by an
individual; for example, the performance level to be achieved
at termination of training, or the prediction of deficiencies
which may be remedied by an appropriate change of training.

If two measures correlate highly, then conceivably one of
the pair may be removed from the candidate measure set. In
fact, it may be quite necessary to remove such measures for the
proper functioning of the multivariate statistical analyses used
for testing other measure selection criteria. However, the
investigator must also ensure that small differences between two
imperfectly correlated measures are not important; for example,
one subject of a larger group may be sufficiently different that
the measures are definitely uncorrelated for him. Further, of
course, the magnitude of the correlation coefficient above which
measures will be considered redundant is left to the judgment of
the investigator.

A multivariate statistical technique, the multiple
discriminant analysis, is available to derive a discriminant
function, composed of a weighted sum of the available measures,
which will discriminate best between two or more groups. The
computed weights indicate the relative amount each of the
measures contributes to the discrimination. If the investigator
can establish test groups which are known to be different in ways
which are of interest to him, and test data are collected, the
multiple discriminant analysis can be used to find those measures
of the candidate measure set which facilitate discrimination.

The weightings, then, are the key to the definition of a selection
criterion: the criterion can be that the measures with the
least weights are discarded. Of course, the threshold level for
measure weights is also left to the investigator's judgment.

Another multivariate statistical technique, canonical
correlation analysis, can be used to test the prediction
qualities of a measure set. Through this technique, measures are
found which, as a whole, correlate when measured at one time
(e.g., early in training) as compared to the same measures taken
at another time (e.g., late in training). Again, weights are
associated with the measures, which may be used to define a
criterion for selection.
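Canonical correlations between an early and a late measure set can be computed with a QR decomposition of each centered set followed by a singular value decomposition. The sketch below uses only standard NumPy calls; the function name `canonical_correlation` and the simulated data are hypothetical, and this is not the report's program. Each set needs more observations than measures and full column rank; the weight columns returned play the role of the measure weights mentioned above.

```python
import numpy as np


def canonical_correlation(x: np.ndarray, y: np.ndarray):
    """Canonical correlations between measure sets x (e.g. early) and y (e.g. late).

    Rows are observations; both sets need more rows than columns and full
    column rank. Returns (correlations, x_weights, y_weights).
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    qx, rx = np.linalg.qr(xc)
    qy, ry = np.linalg.qr(yc)
    u, s, vt = np.linalg.svd(qx.T @ qy)
    x_weights = np.linalg.solve(rx, u)  # column k: weights for k-th canonical variate
    y_weights = np.linalg.solve(ry, vt.T)
    return s, x_weights, y_weights


rng = np.random.default_rng(1)
early = rng.normal(size=(200, 3))
late = np.column_stack([early[:, 0] + 0.2 * rng.normal(size=200),
                        rng.normal(size=(200, 2))])
r, a, b = canonical_correlation(early, late)
print(np.round(r, 2))
```

The singular values are the canonical correlations because the columns of `qx` and `qy` are orthonormal bases of the two centered measure sets.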


The specific details of the criteria used, and the mechanics
of the selection techniques, are perhaps best presented in terms
of the computer operations needed. A description of the
computerized selection techniques is given in the following
section.
SECTION III

COMPUTERIZED MEASURE SELECTION TECHNIQUES

SELECTION BY DISCRIMINANT ANALYSIS

The programs generated to select measurement through
discriminant analyses (Cooley and Lohnes, 1971) assume that a
battery of measures has been taken for each of a number of
groups of subjects. The primary purpose of these programs is
to isolate the measures which best discriminate between the
groups. For example, a pair of groups may consist of experienced
and inexperienced subjects, respectively. The procedure adopted
discards measurement which does not contribute to such
discriminations.

The computer programs for selection based on discriminant
analysis (DISCRM SELECT) iteratively discard measures until a
minimum set of measures results. The iterative process stops
when either of two criteria is met: (1) the total number of
remaining measures is less than the minimum number of factors
determined through a principal components analysis (program
PRINCO), or (2) discarding another measure will reduce
discrimination to an unacceptable level.
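The two stopping criteria can be sketched as a pair of small checks. The report does not state how PRINCO fixes the minimum number of factors, so this sketch assumes the common rule of counting correlation-matrix eigenvalues greater than 1; that choice, the function names, the `floor` argument, and the simulated data are all hypothetical. The discrimination score is left abstract because its definition is not given here.

```python
import numpy as np


def minimum_factor_count(data: np.ndarray, min_eigenvalue: float = 1.0) -> int:
    """Number of principal components of the correlation matrix whose eigenvalue
    exceeds min_eigenvalue (an assumed stand-in for PRINCO's criterion)."""
    corr = np.corrcoef(data, rowvar=False)
    return int(np.sum(np.linalg.eigvalsh(corr) > min_eigenvalue))


def should_stop(n_remaining: int, n_factors: int,
                discrimination_after_drop: float, floor: float) -> bool:
    """The two DISCRM SELECT stopping criteria described above."""
    return n_remaining < n_factors or discrimination_after_drop < floor


rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 2))  # two hypothetical underlying skills
measures = np.repeat(latent, 3, axis=1) + 0.3 * rng.normal(size=(500, 6))
print(minimum_factor_count(measures))  # 2
```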

The above procedure is satisfactory unless some of the
measures are highly correlated. In that case, the ability of the
measure set to discriminate between groups cannot be clearly
attributed to either member of a pair of correlating measures.
The procedure adopted therefore first performs a correlation
analysis, and one measure of any highly correlating pair is
discarded. In the programs developed, the right-hand measure of
a correlating pair in the correlation matrix is the one dropped.
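The correlation screen described above can be sketched in Python as follows. This is a minimal illustration, not the DISCRM SELECT code: it assumes the screen compares the absolute value of each correlation with the tolerance (TOL 3), scans the matrix row by row, and does not let an already-discarded measure eliminate others. The function name `screen_correlated` is hypothetical.

```python
def screen_correlated(names, corr, tol3):
    """Drop the right-hand member of each pair whose |r| exceeds tol3.

    names: measure names in correlation-matrix order.
    corr:  square correlation matrix as a list of lists.
    Returns (kept, removed) lists of names.
    """
    n = len(names)
    dropped = set()
    for i in range(n):
        if i in dropped:
            continue  # assumption: a discarded measure eliminates nothing else
        for j in range(i + 1, n):
            # j lies to the right of i in the matrix, so j is the one dropped
            if j not in dropped and abs(corr[i][j]) > tol3:
                dropped.add(j)
    kept = [names[k] for k in range(n) if k not in dropped]
    removed = [names[k] for k in range(n) if k in dropped]
    return kept, removed


# Hypothetical three-measure example with TOL 3 = .69
corr = [[1.0, 0.80, 0.20],
        [0.80, 1.0, 0.90],
        [0.20, 0.90, 1.0]]
kept, removed = screen_correlated(["A", "B", "C"], corr, 0.69)
print(kept, removed)  # ['A', 'C'] ['B']
```

Because B is removed as the right-hand partner of A, its high correlation with C is never examined, so C survives. The result therefore depends on the order of the measures in the matrix.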

FLOW  DIAGRAM.  The  flow  diagram  for  DISCRM  SELECT  is  presented 
in  figure  2.  The  output  produced  is  listed  in  table  5.  The 
version  shown  is  a  test  program  using  a  random  number  generator 
to produce data with known characteristics. Three tolerances
must be specified: (TOL 1) the minimum percent of the original
variance to be accounted for by any measure of the final reduced
set of measures, (TOL 2) the minimum proportion of the variance
of a specific measure extracted by all discriminant functions,
and (TOL 3) the maximum correlation permitted between measures.

After test measures are generated, and tolerances and
measure names are input, a correlation analysis is performed
and the right-hand member of a pair of measures is discarded if
the correlation coefficient exceeds TOL 3. A new list of measures
is then printed indicating the measures which have been retained
or discarded.
Figure 2. Flow Diagram of Discriminant Selection Process

LEGEND: TOL1: Minimum Percent Variance to be Accounted for by
              any Measure of the Set
        TOL2: Minimum Discrimination Communality
        TOL3: Maximum Measure Intercorrelation Permitted
TABLE 5. DISCRM SELECT OUTPUT

Initial Output
    Criteria for Selection
    Correlation Matrix
    Measure Set Summary (after Removing Correlating Measures)
Principal Components
    Correlation Matrix
    Sphericity Test
    Factors, % Trace, DF, CHI-SQUARE
    Factor Pattern
    Communality & Multiple R by Measure
    Factor Score Coefficients
Rotations
    VARIMAX
    QUARTIMAX
Multivariate Analysis of Variance
    Means & Standard Deviations by Group
    Test of Equality of Dispersions
    Univariate F-Ratios
    Multivariate Test - Wilks' LAMBDA & F-Ratios
Multiple Discriminant Analysis
    Multivariate Test - Wilks' LAMBDA & F-Ratios
    Chi-Square with Successive Roots Removed
    Row Coefficients Vectors
    Factor Pattern
    Communalities
    % Trace Accounted for by each Root
    Group Centroids
Measure Set Summary
    Measures Kept and Dropped
A principal components analysis is then performed on the
reduced measure set. The full analysis is printed, along with
VARIMAX and QUARTIMAX rotations of the factors. The variance
accounted for by the factors is compared to TOL 1 to determine the
minimum number of factors required, and hence the minimum number
of measures required (MIN).
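The exact rule PRINCO uses to turn TOL 1 into MIN is not given here. One plausible reading, assumed in the sketch below, is that MIN is the number of principal components whose share of the total variance (the trace of the correlation matrix) exceeds TOL 1; the function name `min_factors` is hypothetical.

```python
def min_factors(eigenvalues, tol1_percent):
    """Number of principal components whose share of the trace exceeds TOL 1.

    eigenvalues:  eigenvalues of the measure correlation matrix.
    tol1_percent: TOL 1 as a percent of total variance (e.g. 7.0).
    """
    trace = sum(eigenvalues)  # equals the number of measures for a correlation matrix
    return sum(1 for ev in eigenvalues if 100.0 * ev / trace > tol1_percent)


# Hypothetical eigenvalues for five measures (trace = 5.0)
print(min_factors([2.4, 1.3, 0.8, 0.3, 0.2], 7.0))  # 3
```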

Subsequently, a Multivariate Analysis of Variance (MANOVA)
and a Multiple Discriminant Analysis (DISCRM) are performed and
all results are printed. The communality associated with each
measure is computed and printed; this is the proportion of the
variance associated with the specified measure which is extracted
by all discriminant functions. The minimum communality (CMIN) is
determined, and the measure (NR) associated with CMIN is noted.

The computation will now stop with a final listing of
measures kept and dropped if (1) the number of measures (M) is
minimal (M < MIN), or (2) the minimum communality is greater than
TOL 2, i.e., discarding another measure would significantly
reduce the ability of the total measure set to discriminate
between the experimental groups.

Otherwise, the computation iterates through the sequence
again, this time with the measure associated with CMIN dropped
and a new correlation matrix computed for the reduced data
base.
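The control flow of the last three paragraphs can be summarized in a short Python sketch. The statistical steps are passed in as functions because their internals are not listed here; `discrm_select`, `screen`, `min_factors_for` and `communalities` are hypothetical names. The stopping test follows the text: stop when M < MIN or when CMIN exceeds TOL 2.

```python
def discrm_select(measures, screen, min_factors_for, communalities, tol2):
    """Iterate the DISCRM SELECT cycle until one of its stopping rules fires.

    screen:          callable(list) -> list; correlation screen against TOL 3.
    min_factors_for: callable(list) -> int; MIN from the principal components.
    communalities:   callable(list) -> dict; discriminant communality per measure.
    tol2:            TOL 2, the minimum acceptable discriminant communality.
    """
    current = list(measures)
    while True:
        current = screen(current)          # fresh correlation screen each pass
        if not current:
            return current                 # guard added for the sketch
        m_min = min_factors_for(current)   # MIN
        comm = communalities(current)      # MANOVA + DISCRM results
        nr = min(current, key=lambda name: comm[name])  # measure NR with CMIN
        if len(current) < m_min or comm[nr] > tol2:
            return current                 # final list of measures kept
        current.remove(nr)                 # drop NR and iterate again


# Hypothetical stand-ins: no correlated pairs, MIN = 2, fixed communalities
fixed = {"a": 0.50, "b": 0.10, "c": 0.30, "d": 0.25}
kept = discrm_select(["a", "b", "c", "d"], lambda m: m, lambda m: 2,
                     lambda m: fixed, tol2=0.20)
print(kept)  # ['a', 'c', 'd']
```

In a real run, the three callables would recompute the correlation matrix, principal components and discriminant analysis on each reduced set. Here they are constants, so only the loop logic is exercised.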

SELECTION  BY  CANONICAL  CORRELATION  ANALYSIS 

The programs called DISCRM SELECT were designed to aid in
the selection of measures which are capable of discriminating
between previously designated groups. Another series of
programs, described in this section, was designed to select
measures which relate performance exhibited at one time in
training to that at another time. The basis of the method is
a canonical correlation analysis (Cooley and Lohnes, 1971), which
derives a linear combination of each of two sets of measures such
that the correlation between the two linear combinations is
maximized. Suppose the following linear combinations are
formed:

$$y_1 = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$

$$y_2 = b_1 z_1 + b_2 z_2 + \cdots + b_n z_n$$

where $x_i$ and $z_i$ are the same measures collected at different
points in the training sequence. Canonical correlation analysis
then determines the coefficients $a_i$ and $b_i$ so that $y_1$ and
$y_2$ correlate maximally.
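For reference, assuming all measures are standardized, the coefficients and correlations described above are usually obtained from the standard eigenvalue formulation of canonical correlation. Here $\mathbf{R}_{xx}$ and $\mathbf{R}_{zz}$ are the within-set correlation matrices of the $x$ and $z$ measures, and $\mathbf{R}_{xz} = \mathbf{R}_{zx}^{\top}$ holds the between-set correlations:

$$
\mathbf{R}_{xx}^{-1}\mathbf{R}_{xz}\mathbf{R}_{zz}^{-1}\mathbf{R}_{zx}\,\mathbf{a} = R_c^{2}\,\mathbf{a},
\qquad
\mathbf{b} \propto \mathbf{R}_{zz}^{-1}\mathbf{R}_{zx}\,\mathbf{a}
$$

In this formulation, $\mathbf{a} = (a_1, \ldots, a_n)^{\top}$, $\mathbf{b} = (b_1, \ldots, b_n)^{\top}$, and each eigenvalue $R_c^{2}$ is the shared variance of one pair of canonical factors. The largest eigenvalue gives $y_1$ and $y_2$. The remaining eigenvectors give further, mutually uncorrelated pairs of factors, up to as many pairs as there are measures in the smaller set. With $\mathbf{a}$ scaled so that $\mathbf{a}^{\top}\mathbf{R}_{xx}\mathbf{a} = 1$, the factor structure of the left set (the correlation of each measure with $y_1$) is $\mathbf{R}_{xx}\mathbf{a}$.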

The quantities y1 and y2 are factors of their respective
data groups. The computer programs generate the factor structure
for each set of data, which displays the correlation between
each measure and each factor. The factor which correlates best
between the two groups is also indicated, and it is this factor
which is used for measure selection. The measure which correlates
least with this factor contributes least to inter-group
correlation. It is this measure which is equated with the
computer parameter RMIN in the CANON SELECT programs.
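As a sketch of how the factor structure and RMIN relate, assume standardized measures. The correlation of measure $x_j$ with $y_1$ is then $\sum_k r_{jk} a_k$ divided by the standard deviation of $y_1$, where $r_{jk}$ is the correlation between measures $j$ and $k$. The helper names below are hypothetical, and "correlates least" is taken to mean the smallest absolute correlation.

```python
import math


def factor_structure(r_xx, a):
    """Correlation of each standardized measure with y = sum_k a[k] * x[k]."""
    r_a = [sum(r_jk * a_k for r_jk, a_k in zip(row, a)) for row in r_xx]
    var_y = sum(a_j * ra_j for a_j, ra_j in zip(a, r_a))  # a' R a
    sd_y = math.sqrt(var_y)
    return [value / sd_y for value in r_a]


def rmin(names, structure):
    """Measure (and its correlation) least correlated with the factor."""
    k = min(range(len(names)), key=lambda i: abs(structure[i]))
    return names[k], structure[k]


# Hypothetical example: two uncorrelated measures with weights 3 and 4
s = factor_structure([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0])
print(s)                      # [0.6, 0.8]
print(rmin(["x1", "x2"], s))  # ('x1', 0.6)
```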

FLOW DIAGRAM. CANON SELECT iteratively reduces the measure set
until all of the remaining measures contribute sufficiently to
inter-group correlation. The flow diagram in figure 3
corresponds to a test version of the program which generates
measures artificially; computer output categories are listed
in table 6.

A  canonical  correlation  analysis  is  performed  (program 
CANON)  and  the  measure  with  minimum  weighting  (RMIN)  is  selected. 
If  this  measure  contributes  less  than  a  pre-specified  amount  to 
the canonical correlation (RMIN < TOL 1), the measure is dropped
from  the  data  base  and  another  canonical  correlation  is 
performed.  These  steps  are  performed  iteratively,  with  a  new 
list  of  measures  printed  at  each  step,  until  the  minimum 
measure  redundancy  is  equal  to  or  greater  than  the  pre-specified 
tolerance. 
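The iteration can be sketched as follows. The canonical correlation step is supplied as a function that returns each measure's correlation with the selected canonical factor. The names are hypothetical, absolute correlations are compared (an assumption), and the loop records the measure list at each step to mirror the printed lists.

```python
def canon_select(measures, structure_for, tol1):
    """Drop the weakest measure until every remaining one reaches TOL 1.

    structure_for: callable(list) -> dict giving each measure's correlation
                   with the selected canonical factor (rerun every pass).
    Returns the final measure list and the list printed at each step.
    """
    current = list(measures)
    history = [list(current)]
    while len(current) > 1:  # guard added for the sketch
        s = structure_for(current)
        worst = min(current, key=lambda name: abs(s[name]))  # RMIN measure
        if abs(s[worst]) >= tol1:
            break  # every remaining measure contributes enough
        current.remove(worst)
        history.append(list(current))
    return current, history


# Hypothetical fixed correlations with the first canonical factor, TOL 1 = .25
loadings = {"a": 0.60, "b": -0.10, "c": 0.30, "d": 0.20}
final, steps = canon_select(["a", "b", "c", "d"], lambda m: loadings, 0.25)
print(final)       # ['a', 'c']
print(len(steps))  # 3
```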
Figure 3. Flow Diagram of Canonical Correlation
Selection Process (continued)
TABLE 6. CANON SELECT OUTPUT

1. Measure Set Summary (Measures Kept & Dropped)
2. Correlation Matrix
3. Canonical Weights (Left and Right Set)
4. Factor Structure (Left and Right Set)
5. Variance Extracted, Redundancy (Left and Right Set)
6. Total Variance, Redundancy (Left and Right Set)
7. Total Set: Wilks' LAMBDA, CHI SQUARE, Degrees of Freedom
8. CHI SQUARE Tests with Successive Roots Removed
SECTION IV

DEVELOPMENT OF MEASURE SELECTION TECHNIQUES


Measure selection techniques were developed within a
computer-controlled training environment. The environment was
the automated instrument flight maneuvers (IFM) training system
developed by Johnson (1972) on the Training Device Computer
System (TRADEC) located at the Naval Training Equipment Center.
IFM automatically sequenced the trainee through a series of
maneuvers and simulated flight conditions as a function of
measured trainee performance on the previous and antecedent
trials. The performance measures (and weighting coefficients for
summing the various components of error into one composite score)
were derived during IFM system design from task analytic data;
the measures were never formally tested.

In  order  to  produce  data  for  empirical  measure  selection 
studies,  the  IFM  system  was  modified  to  control  a  measure 
selection  experiment  and  to  produce  raw  data  for  subsequent 
(non-real-time)  conversion  into  candidate  measures  and  further 
measure  selection  analyses. 

DATA  COLLECTION  METHOD 

A  data  base  for  preliminary  measure  selection  analyses  was 
created  by  conducting  a  study  in  which  trainees  flew  each  of 
four  principal  maneuvers  of  IFM  until  their  performance  was 
assumed  to  be  very  good  by  virtue  of  having  flown  the  simulator 
for  14-18  hours.  Measure  selection  methods  were  developed  using 
the  preliminary  data  base  so  produced. 

PARTICIPANTS. Four participants were used. They were low-time
private pilots who were unskilled at instrument flight at the
onset of data collection. All were light plane pilots; none
were familiar with jet fighter dynamic response.

APPARATUS. The test equipment was the TRADEC, which was
configured as a fixed-wing aircraft (F-4E). TRADEC hardware
included an XDS Sigma-7 computer and associated peripherals, an
aircraft cockpit mounted on top of a four-degree-of-freedom
motion platform (pitch, roll, yaw and heave), and a host of
related equipment. A digital computer program provided the basic
flight simulation (cf. Kapsis et al., 1969; Erickson et al.,
1969). The basic flight program was converted into a
computer-controlled training device by the automated IFM program.

IFM was modified from an automated training configuration
to an automated data collection configuration. The
computer-controlled speech synthesizer (COGNITRONICS) was used to
brief participants on the task requirements for each trial and to
issue corrective commentary when various vehicle states were out
of tolerance. The task scheduler was used to set the experimental
conditions for the next trial as prescribed by the experimental
design.

EXPERIMENTAL DESIGN. Each of the four participants (see table 7)
was trained on four basic instrument flight maneuvers for 18
one-hour sessions over a period of seven weeks. The four
maneuvers were (1) straight and level flight, (2) standard rate
climbs and descents, (3) level turns, and (4) climbing and
descending turns. Six trials of each maneuver were flown during
each training session. Each successive pair of odd- and
even-numbered training sessions was pooled into one unit called a
training "day"; thus, sessions 1 and 2 became Day 1, sessions 3
and 4 became Day 2, etc. This pooling resulted in 48 observations
(4 participants by 6 trials by 2 sessions) for each maneuver for
each day.
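The pooling rule and the resulting cell size are simple arithmetic, and the sketch below reproduces them. The names are hypothetical. It also shows why the 18 sessions yield Days 1 through 9, and why pooling two days gives the 96 observations per measure used later in the canonical correlation example.

```python
def training_day(session):
    """Map a one-hour session number (1-18) to its pooled training day."""
    return (session + 1) // 2


PARTICIPANTS = 4
TRIALS_PER_SESSION = 6   # trials of each maneuver per session
SESSIONS_PER_DAY = 2

observations_per_day = PARTICIPANTS * TRIALS_PER_SESSION * SESSIONS_PER_DAY
days = sorted({training_day(s) for s in range(1, 19)})

print(observations_per_day)      # 48
print(days[-1])                  # 9
print(2 * observations_per_day)  # 96, two pooled days (e.g. Days 1 and 3)
```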

Two task stressors were used: turbulent air, and aircraft
weight and center of gravity. The turbulent air was generated
in the flight program from a random number generator. When
used, its intensity was set to a "light turbulence" level as
defined by the IFM program. The aircraft weight was either
light or heavy. The light aircraft carried 2,500 pounds of fuel,
had a gross weight of 33,600 pounds and a center of gravity at
29.0 percent mean aerodynamic chord. The heavy aircraft carried
12,896 pounds of fuel, had a gross weight of 43,996 pounds and a
center of gravity at 30.2 percent mean aerodynamic chord. The
weight increase and aft center of gravity shift reduced the
longitudinal axis short-period damping coefficient, which
decreased the simulator pitch axis stability, making it more
difficult to control. Task stressors were not changed during a
trial.
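The following is a quick consistency check on the two loadings, using plain arithmetic on the figures above (variable names are illustrative). Both configurations have the same weight without fuel, so the 10,396-pound difference in gross weight is entirely fuel. That added fuel accompanies a 1.2-point aft shift in center of gravity.

```python
LIGHT = {"fuel_lb": 2_500, "gross_lb": 33_600, "cg_pct_mac": 29.0}
HEAVY = {"fuel_lb": 12_896, "gross_lb": 43_996, "cg_pct_mac": 30.2}


def weight_without_fuel(config):
    """Gross weight minus fuel weight, in pounds."""
    return config["gross_lb"] - config["fuel_lb"]


added_fuel = HEAVY["fuel_lb"] - LIGHT["fuel_lb"]
added_gross = HEAVY["gross_lb"] - LIGHT["gross_lb"]
cg_shift = round(HEAVY["cg_pct_mac"] - LIGHT["cg_pct_mac"], 1)

print(weight_without_fuel(LIGHT), weight_without_fuel(HEAVY))  # 31100 31100
print(added_fuel, added_gross, cg_shift)  # 10396 10396 1.2
```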


Each participant received exactly the same order of
experimental trials each day. Thus, maneuver one always was
flown first and maneuver four always was flown last. This fixed
order permitted the study of measures for each maneuver under
identical antecedent conditions (and subsequent order effects)
across training days.

Performance data from Days 1, 3, 5 and 7 were the primary units
for measure selection analyses. It was assumed that after 14
one-hour training sessions (the conclusion of Day 7), the
participants would be relatively proficient on the basic
maneuvers. Data were collected during Days 2, 4, 6, 8 and 9 for
further examination of the effects of the task stressors on the
measure set in a later study (beyond the scope of the current
effort).

MEASUREMENT. Eighteen (18) pilot/system performance parameters,
shown in table 8, were collected on magnetic tape at a rate of
five times per second from the beginning to the end of training.
Only the raw data from the straight and level maneuver trials
were transformed into candidate measure sets for the purpose of
preliminary measurement selection method development. The
transforms available in the measure producing programs are
shown in table 4 in Section II. Specific maneuver one candidate
measures are shown in table 9.

TABLE 8. RAW DATA PARAMETERS

      PARAMETER                      UNITS                ABBREVIATION
 1.   SYSTEM CLOCK COUNT                                  CLOK
 2.   ELEVATOR STICK FORCE           POUNDS               ELVF
 3.   ELEVATOR STICK DISPLACEMENT    INCHES               ELVS
 4.   ANGLE OF ATTACK                UNITS                ALPH
 5.   PITCH ATTITUDE                 DEGREES              PTCH
 6.   CLIMB/DESCENT RATE             FEET PER MINUTE      HDOT
 7.   ALTITUDE                       FEET                 ALT
 8.   RIGHT THROTTLE DISPLACEMENT    DEGREES              THRR
 9.   AIRSPEED                       KNOTS                A/S
10.   AILERON STICK FORCE            POUNDS               AILF
11.   AILERON STICK DISPLACEMENT     INCHES               AILS
12.   ROLL ATTITUDE                  DEGREES              ROLL
13.   TURN RATE                      DEGREES PER SECOND   TURN
14.   HEADING                        DEGREES              HEAD
15.   RUDDER PEDAL FORCE             POUNDS               RUDF
16.   RUDDER PEDAL DISPLACEMENT      INCHES               PED
17.   SIDESLIP                       DEGREES              BETA
18.   TURBULENT AIR INTENSITY        ARBITRARY UNITS      RUFF


RESULTS 


Measure  selection  analyses  were  conducted  three  ways  by 
using  (1)  t-tests,  (2)  multiple  discriminant  analyses,  and 
(3)  canonical  correlation  analyses.  The  purpose  of  the  analyses 
was  to  develop  and  test  selection  methods.  Only  a  small  sample 
of  the  data  are  presented. 

t-TESTS. The t-tests considered each measure independently of all
other measures; no consideration was given to measure
correlation. As a result, 17 measures were found to be
significantly different between Day 1 and Day 7, as shown in table 10.
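The form of t-test used is not stated. As an assumption, the sketch below computes the pooled-variance independent-samples t statistic for one measure, such as Day 1 against Day 7. With 48 observations per day, this test has 94 degrees of freedom. Because the same four participants contribute to both days, a paired or repeated-measures analysis could also be argued for. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(sample_1, sample_2):
    """Independent-samples t with pooled variance; returns (t, df)."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1)
                  + (n2 - 1) * variance(sample_2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / se, df


# Hypothetical values of one measure on two days
t, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```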

DISCRIMINANT SELECTION. The 24 candidate measures were reduced
to seven measures which could significantly discriminate between
Day 1 and Day 7 performance, as shown in table 11. The greatest
reduction in the candidate measure set occurred during the
initial correlation analysis. All measures were intercorrelated,
and the right-hand measure of a pair was eliminated if the
correlation exceeded .69. This criterion reduced the candidate
set from 24 measures to 9 measures.

The discriminant selection procedure further reduced the set
from nine to the seven measures shown in table 11, based on two
criteria: (1) any measure of the final set must account for more
than seven percent of the total variance, and (2) the minimum
measure communality is .200. Communality can be thought of as
the proportion of the variance associated with each measure which
is extracted by all discriminant functions. The discriminant
vectors shown in table 11 are the weighting coefficients for the
summation of measures into a discriminant function.
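The following sketch illustrates how measures are summed with the discriminant vector, applying the table 11 coefficients to the table 11 group means. It assumes the coefficients apply directly to raw measure values, with no additive constant. Whether the vectors refer to raw or standardized scores is not stated, so the numbers show only the mechanics of forming one composite score. The names are hypothetical.

```python
# Discriminant vector from table 11 (assumed to weight raw measure values)
DISCRM_VECTOR = {
    "ELRG": 0.142, "ELF1": -1.758, "AIF1": 6.548, "PDRG": 2.358,
    "PSRM": -0.005, "HAA": 9.067, "ASRG": 0.052,
}

# Group means from table 11
DAY1_MEANS = {"ELRG": 1.71, "ELF1": 0.03, "AIF1": 0.06, "PDRG": 0.19,
              "PSRM": 2.98, "HAA": 0.05, "ASRG": 16.92}
DAY7_MEANS = {"ELRG": 1.09, "ELF1": 0.02, "AIF1": 0.02, "PDRG": 0.08,
              "PSRM": 1.73, "HAA": 0.02, "ASRG": 8.49}


def discriminant_score(measures, weights=DISCRM_VECTOR):
    """Weighted sum of the selected measures (one composite score)."""
    return sum(w * measures[name] for name, w in weights.items())


print(round(discriminant_score(DAY1_MEANS), 3))  # 2.349
print(round(discriminant_score(DAY7_MEANS), 3))  # 1.053
```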

The composition of the discriminating set was of interest.
Three measures represented outer-loop vehicle states: heading,
altitude and airspeed. Four measures represented control input
states: elevator stick range and crossover power, aileron stick
crossover power, and rudder pedal range. Thus, over half of the
measures which discriminated between early and late performance
were control input measures.

It was also of interest to examine the change in the
descriptive capability of the resulting measure set. The factor
loadings from the principal components analysis are shown in
table 12. It was apparent that the loadings on factor I were
higher on Day 7 than on Day 1, and that the amount of variance
accounted for by that factor was 21 percent higher on Day 7.
TABLE 9. CANDIDATE MEASURES FOR MANEUVER ONE MEASURE SELECTION METHOD DEVELOPMENT
TABLE 10. AVERAGE MANEUVER ONE MEASURES

MEASURE     DAY 1      DAY 3      DAY 5      DAY 7
ELRG        1.707*     1.184      1.059      1.086
ELF1         .033*      .019       .018       .017
ELF2         .687       .713       .700       .708
AIRG        1.064*      .787       .692       .641
AIF1         .056*      .025       .021       .021
AIF2         .253       .274       .256       .269
PDRG         .186*      .119*      .066       .079
PDF1         .014       .011       .010*      .015
PDF2         .149       .161       .150       .151
ALRG        2.378*     1.664      1.491      1.607
ALSD         .446*      .307       .269       .274
PTRM        2.611      2.606      2.591      2.591
PTSD         .837*      .495       .436       .414
PTRG        3.766*     2.383      2.101      2.142
RORM        2.616      2.324      2.616      2.331
RORG       12.198      9.305     10.066      8.705
PSRM        2.982*     2.219*     2.058*     1.733
PSRG        3.868*     2.934      2.893      2.756
HAA          .052*      .028*      .021       .017
HRG          .173*      .093*      .074       .066
HAAA         .456*      .242       .217       .190
HDRG        2.354*     1.338      1.151      1.125
ASAA        8.290*     4.114*     3.341      3.015
ASRG       16.916*    10.818*     8.444      8.492

*Measure is significantly different from Day 7, P<.05, based on
t-test; 48 observations per mean.


In  general,  the  performance  dimensions  expressed  by  the 
factor  structures  appeared  to  be  more  integrated  on  Day  7  than 
on Day 1. Four factors accounted for 88 percent of the variance
on Day 7, whereas five factors accounted for only 86 percent of
the variance on Day 1. Note also the integration of Pedal Range
into the first and second factors by Day 7.
TABLE 11. MEASURES SELECTED BY DISCRIMINANT ANALYSIS*

MEASURE   P<**   DISCRM VECTOR   COMMUNALITY   DAY 1 MEAN   DAY 7 MEAN
ELRG      .01         0.142         .4088          1.71         1.09
ELF1      .01        -1.758         .2409           .03          .02
AIF1      .01         6.548         .3475           .06          .02
PDRG      .01         2.358         .3506           .19          .08
PSRM      .01        -0.005         .2496          2.98         1.73
HAA       .01         9.067         .4573           .05          .02
ASRG      .01         0.052         .5895         16.92         8.49

*The overall discrimination is significant, P<.01, for an F-ratio
approximation of 9.18 with 7/88 df.

**The probability that the differences between the means were due
to chance, based on univariate F-ratios.


TABLE 12. FACTOR LOADINGS FOR FINAL MEASURES

*Factor loadings less than .30 are generally considered
insignificant, and were omitted from the table.

**Percent variance accounted for by each factor.
Further rotation of the factor loadings (table 13) suggested
that each measure of the final set essentially represented an
independent performance dimension on Day 1. As training
progressed to Day 7, measures tended to double up on three
factors; this was interpreted to indicate an increase in
control integration and coordination.

CANONICAL CORRELATION SELECTION. The output from the CANON SELECT
computer programs satisfied the initial requirements for
selecting predictive measures. For the sample data shown in
tables 14, 15 and 16, the 24 candidate measures were reduced to
a predictive set of 18 measures. The criterion for measure
rejection was a correlation of less than .25 with the first
canonical factor. This criterion was deliberately set low for
the initial tests.

To recapitulate, the procedure used the convention of
designating the predictor measurement as the "left" side (of the
intercorrelation matrix) and the criterion as the "right" side,
although the canonical correlation model is completely
symmetrical. Since we were interested in the possibility of using
the measurement for prediction, our attention was focused on the
left-side measures. The following output was produced:

a. Canonical factors: the coefficients which define factors for
   each measure set, such that the factors of the two sets have
   the highest correlation.

b. Factor structure: the correlation of each measure with each
   canonical factor.

c. The proportion of shared variance (Rc2) between the
   corresponding canonical factors.

d. Redundancy: the product of the proportion of shared variance
   and the proportion of the variance extracted by a canonical
   factor (i.e., the proportion of the variance of one set
   accounted for, or "explained", by a specific canonical factor
   of the other set).

e. Bartlett's test for significance of canonical correlation.
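Item d amounts to a single multiplication. The sketch below, with a hypothetical function name, applies it to the first two factors of the sample case in tables 14 and 15. Those factors have left-set variance extracted of .147 and .064 and Rc2 of .76 and .70, and the sketch reproduces the tabulated left-set redundancies.

```python
def redundancy(variance_extracted, rc_squared):
    """Item d: shared variance (Rc squared) times the variance extracted."""
    return rc_squared * variance_extracted


# First two factors of the sample case (tables 14 and 15)
print(round(redundancy(0.147, 0.76), 3))  # 0.112
print(round(redundancy(0.064, 0.70), 3))  # 0.045
```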

A sample case was extracted from the data to illustrate the
method. The data shown in tables 14, 15 and 16 were derived
from a test of the ability of pooled Day 1 and Day 3 data to
predict pooled Day 5 and Day 7 data. Since the data were pooled,
each measure had 96 observations; the left set represented
Day 1 and Day 3, and the right set represented Day 5 and Day 7.
TABLE 13. ROTATED FACTOR LOADINGS FOR FINAL MEASURES*

DAY     MEASURE   LOADINGS ON FACTORS I-VII
DAY 1   ELRG      -.97
        ELF1       .96
        AIF1       .97
        PDRG       .99
        PSRM       .94
        HAA        .95
        ASRG       .98
DAY 7   ELRG      -.86
        ELF1       .94
        AIF1       .94
        PDRG       .35, .38, -.76
        PSRM       .92
        HAA        .35, -.86
        ASRG       .92

*Factor loadings less than .30 are generally considered
insignificant, and were omitted from the table.

The canonical factors were ordered (see table 14) on the
canonical correlation coefficient (Rc); the first canonical
factor of the left set and the first canonical factor of the
right set had the highest correlation. The redundancy for each
factor is listed, indicating, for each factor of the left set,
the proportion of variance of the right set accounted for.

In the sample case shown in table 14, the first canonical
factor of the left set extracted 14.7 percent of the variance of
that set and explained 11.2 percent of the variance of the right
set. The first canonical factor of the right set accounted for
only four percent of the variance of the right set. Thus,
although the first canonical factor had the highest canonical
correlation, it accounted for only a small portion of the total
variance. The contributions of the remaining factors were
evident: of the total left-set redundancy of .338, the first
factor supplied only .112.

The output data for the test of significance are shown in
table 15. The roots are related to factors; removal of a root
is equivalent to dropping a factor from the canonical
correlation. Table 15 reveals that nearly all of the factors were
needed to adequately account for the shared variance between the
left and right set data in this particular case.
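The chi-square values in table 15 are consistent with Bartlett's statistic, chi2 = -[N - 1 - (p + q + 1)/2] ln(Lambda'). Here N = 96 observations, p = q = 24 measures per set, and Lambda' is the product of (1 - Rc2) over the roots not yet removed. After k roots are removed, df = (p - k)(q - k). The sketch below (hypothetical names) computes these quantities from a list of Rc2 values. Its usage example squares the canonical R column of table 15. Because those inputs are rounded, the results come out close to the tabulated values but not exactly equal to them.

```python
import math


def bartlett_tests(rc_squared, n, p, q):
    """Rows of (roots_removed, lambda_prime, chi2, df), largest root first."""
    scale = n - 1 - (p + q + 1) / 2
    rows = []
    for k in range(len(rc_squared)):
        lam = math.prod(1.0 - r2 for r2 in rc_squared[k:])  # remaining roots
        chi2 = -scale * math.log(lam) if lam > 0 else math.inf
        rows.append((k, lam, chi2, (p - k) * (q - k)))
    return rows


# Canonical R column of table 15, squared here to obtain Rc squared
RC = [.87, .84, .80, .77, .75, .72, .68, .61, .54, .51, .47, .45,
      .43, .42, .38, .33, .27, .25, .19, .16, .14, .10, .02, .01]

for k, lam, chi2, df in bartlett_tests([r * r for r in RC], n=96, p=24, q=24)[:3]:
    print(k, round(lam, 4), round(chi2), df)
```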


TABLE 14. SAMPLE CANONICAL CORRELATION OUTPUT

            LEFT SET                    RIGHT SET
            VARIANCE      LEFT SET      VARIANCE
FACTOR      EXTRACTED     REDUNDANCY    EXTRACTED
  1           .147          .112          .040
  2           .064          .045          .051
  3           .049          .031          .082
  4           .064          .038          .052
  5           .033          .018          .030
  6           .033          .017          .044
  7           .024          .011          .067
  8           .019          .007          .020
  9           .063          .018          .074
 10           .022          .006          .047
 11           .014          .003          .018
 12           .024          .005          .020
 13           .010          .002          .045
 14           .038          .007          .017
 15           .034          .005          .052
 16           .038          .004          .022
 17           .015          .001          .047
 18           .054          .003          .038
 19           .047          .002          .039
 20           .026          .001          .032
 21           .078          .002          .062
 22           .020          .000          .018
 23           .019          .000          .061
 24           .062          .000          .024
Total:        .997          .338         1.000

A more detailed examination of the example case factor
structures is shown in table 16, which presents the correlation
of each measure with each factor. Only three factors are shown.
The measures which contributed least to a specific predictive
factor were thereby identified.

As an initial test, only the factor associated with maximum
correlation was considered. Measures which correlated least
were successively removed from the measure set until all
remaining measures met the a priori criterion (a correlation
exceeding .25). However, it was apparent from the resulting data
that a number of factors contribute to the canonical correlation.
Therefore, the simple criterion based on the first canonical
factor was insufficient. Alternative criteria are presented in
the discussion in Section V.
TABLE 15. EXAMPLE CHI SQUARE TESTS WITH SUCCESSIVE ROOTS REMOVED

ROOTS      CANONICAL                                   LAMBDA
REMOVED        R        R2      CHI2      DF           PRIME
  0           .87      .76       650      576          .0001
  1           .84      .70       550      529          .0004
  2           .80      .64       465      484          .0014
  3           .77      .60       393      441          .0038
  4           .75      .56       329      400          .0094
  5           .72      .52       272      361          .0212
  6           .68      .47       221      324          .0438
  7           .61      .38       176      289          .0818
  8           .54      .29       143      256          .1315
  9           .51      .26       119      225          .1854
 10           .47      .22        98      196          .2500
 11           .45      .20        80      169          .3217
 12           .43      .18        64      144          .4023
 13           .42      .18        50      121          .4922
 14           .38      .15        36      100          .5991
 15           .33      .11        25       81          .7018
 16           .27      .07        17       64          .7893
 17           .25      .06        11       49          .8520
 18           .19      .04         7       36          .9080
 19           .16      .03         4       25          .9431
 20           .14      .02         2       16          .9692
 21           .10      .01         -        9          .9893
 22           .02      .00         -        4          .9993
 23           .01      .00         -        1          .9998
SECTION V
DISCUSSION

CANONICAL  CORRELATION  SELECTION 

The criteria used for the current test were based on the
degree of correlation between each measure and the first
canonical factor; when all measures correlated at a specified
level, no further reduction of the measure set was attempted.
However, it was apparent from the data collected that a number
of canonical factors contributed significantly to canonical
correlation (or prediction). Thus, the criteria for measure
selection must be expanded.

MULTIPLE SIGNIFICANT CANONICAL FACTORS. When a number of
canonical factors are significant, the first measure to be
removed from the measure set should be the one which correlates
least with the group of significant factors. However, the
measure which correlates least with factor I may correlate best
with factor II (as shown in table 16). The correlation across
a group of significant factors must therefore be assessed in some
manner. The following steps are suggested as a partial solution
to this problem:

a. Determine the significant factors. This can be done
using the statistical test presented in the output,
along with a rule of thumb for discarding trivial
factors. Cooley and Lohnes (1971, p. 176) state:
"As a rule, the authors frequently treat canonical
correlations of .30 or less as trivial."

b.  Multiply  the  columns  of  the  factor  structure  by  the 
redundancy  of  the  respective  factor  to  weight  measure 
correlations  with  the  proportion  of  variance  accounted 
for  in  the  criterion  measures. 

c.  Using  the  weights  computed  in  (b)  above,  find  the 
greatest  weight  for  each  measure. 

d.  The  measure  which  is  a  candidate  for  removal  is  the 
measure  corresponding  to  the  least  of  the  numbers 
computed  in  (c)  above. 
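Steps a through d can be sketched as a short function. This is an illustration under stated assumptions, not a tested procedure. Only the .30 rule of thumb is applied in step a (the chi-square test could be added as a further filter), and loadings are compared by absolute value. The names are hypothetical.

```python
def removal_candidate(names, structure, rc, redundancies, trivial_rc=0.30):
    """Steps a-d: measure whose best redundancy-weighted loading is smallest.

    structure[i][f]: correlation of measure i with canonical factor f.
    rc[f]:           canonical correlation of factor f.
    redundancies[f]: redundancy of factor f.
    """
    significant = [f for f, r in enumerate(rc) if r > trivial_rc]  # step a
    if not significant:
        raise ValueError("no non-trivial canonical factors")
    best = []
    for row in structure:
        weighted = [abs(row[f]) * redundancies[f] for f in significant]  # step b
        best.append(max(weighted))  # step c: greatest weight for the measure
    k = min(range(len(names)), key=lambda i: best[i])  # step d
    return names[k], best[k]


# Hypothetical case: m1 loads weakly on factor I but strongly on factor II
structure = [[0.1, 0.9],   # m1
             [0.5, 0.2],   # m2
             [0.3, 0.3]]   # m3
name, weight = removal_candidate(["m1", "m2", "m3"], structure,
                                 rc=[0.8, 0.6], redundancies=[0.10, 0.05])
print(name)  # m3
```

Under the first-factor rule used in the initial test, m1 would have been removed. With the weighting, m1 survives because of its factor II loading, and m3 becomes the candidate instead.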

PREDICTIVE AND CRITERION SET COMPOSITION. It should be noted that,
in the preceding, the right side, or criterion, measures were
not considered during measure selection. In the current
application, however, corresponding right and left measures
were the same. If a measure is to be removed, one should
consider whether it is to be removed from just one side or
from both. There are several possibilities that have yet to be
explored.
Prediction of the Full Original Set of Measures. If we assume
that the original and complete set of measures is better than
any subset for describing performance, and our goal is to predict
total performance, then measures should be removed only from the
left set. Removal of a given measure from both sides
simultaneously may take away a measure which contributes least on
the left side; however, it is quite possible that the removed
measure might be important to the right, or criterion, side.
In application, it might be feasible to have an expanded criterion
set for measure development, while the operational measure set
might be reduced for practical reasons.

Prediction of the Reduced Set of Measures. Practical
considerations might dictate that the same set of measures is to
be used for predicting as well as for measuring that which is to
be predicted. The algorithm for developing the reduced set must
search iteratively for the measure which contributes little to
both sides of the canonical correlation model and simultaneously
remove that measure from both sides. Steps similar to those
suggested in (a-d) above, applied for each measure across both
right- and left-side factors, represent a feasible method.
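One reading of this two-sided variant, assumed in the sketch below, is that a measure is a removal candidate only if its best weighted loading is small on both sides. Each measure is therefore scored by the larger of its left-side and right-side best weights, and the lowest-scoring measure is removed from both sides. The function and parameter names are hypothetical, and absolute loadings and the .30 triviality rule are assumed as in steps a-d.

```python
def removal_candidate_both_sides(names, left, right, rc, red_left, red_right,
                                 trivial_rc=0.30):
    """Measure contributing least to either side of the canonical model.

    left[i][f], right[i][f]: structure correlations of measure i with
    factor f on the left (predictor) and right (criterion) sides.
    red_left[f], red_right[f]: redundancy of factor f on each side.
    """
    significant = [f for f, r in enumerate(rc) if r > trivial_rc]
    if not significant:
        raise ValueError("no non-trivial canonical factors")

    def best(row, red):
        return max(abs(row[f]) * red[f] for f in significant)

    # A measure is kept if it matters on either side, so take the larger value
    scores = [max(best(left[i], red_left), best(right[i], red_right))
              for i in range(len(names))]
    k = min(range(len(names)), key=lambda i: scores[i])
    return names[k], scores[k]


# Hypothetical case: m1 is weak on the left but strong on the right
name, score = removal_candidate_both_sides(
    ["m1", "m2"], left=[[0.1], [0.5]], right=[[0.9], [0.2]],
    rc=[0.8], red_left=[0.1], red_right=[0.1])
print(name)  # m2
```

Considering the left side alone would have removed m1, even though it matters most to the criterion side. This illustrates why the two-sided search can leave a larger predictor set.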

However,  it  must  be  noted  that  the  composition  of  the  predictor 
and  criterion  sets  might  be  somewhat  different;  thus,  the 
utilization  of  this  method  might  create  a  larger  predictor  (left 
side)  set  than  would  result  with  consideration  of  only  the  left 
set  alone. 

Prediction of Specific Performance. If only specific performance
characteristics are to be predicted, then the factors that relate
to this performance must be located. Specific measures of major
importance to the desired performance may be used to identify the
pertinent factors; the measures that load least on these factors
may then be discarded.
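The two-step rule above (locate the factors tied to the key measures, then discard the measures that load least on them) might be sketched as follows. The loading matrix is assumed to come from an earlier factor analysis (rows are measures, columns are factors); the function names, the 0.5 threshold for calling a factor pertinent, and the 0.3 cutoff for discarding a measure are hypothetical values chosen only for illustration.

```python
def pertinent_factors(loadings, key_measures, threshold=0.5):
    """Factors on which at least one key measure loads at or above `threshold` (absolute value)."""
    n_factors = len(loadings[0])
    return [f for f in range(n_factors)
            if any(abs(loadings[m][f]) >= threshold for m in key_measures)]


def measures_to_discard(loadings, key_measures, threshold=0.5, cutoff=0.3):
    """Non-key measures whose largest absolute loading on the pertinent factors is below `cutoff`."""
    factors = pertinent_factors(loadings, key_measures, threshold)
    discard = []
    for m, row in enumerate(loadings):
        if m in key_measures:
            continue  # key measures define the target performance and are always retained
        strongest = max((abs(row[f]) for f in factors), default=0.0)
        if strongest < cutoff:
            discard.append(m)
    return discard


# Rows are measures, columns are factors (illustrative values only).
loadings = [
    [0.80, 0.10, 0.00],
    [0.70, 0.20, 0.10],
    [0.10, 0.90, 0.00],
    [0.05, 0.10, 0.85],
    [0.20, 0.60, 0.10],
]
print(pertinent_factors(loadings, key_measures=[0, 2]))    # [0, 1]
print(measures_to_discard(loadings, key_measures=[0, 2]))  # [3]
```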

Multiple Predictive and Criterion Sets. The discussion of
predictive and criterion set composition is concluded (but not
exhausted) by noting that multiple sets might be required in order
to predict specific terminal behaviors. It would be unwieldy, and
perhaps unwise, to expect the development of just one
all-encompassing predictive and/or criterion set. Since skill
shifts during training, we can anticipate that set composition
will depend on the time and place during training at which the
prediction is to be made, as well as on the specific behavior to
be predicted.

DISCRIMINANT  SELECTION 

The discriminant analysis procedures appeared to work well in
stripping the candidate measures down to a very small subset that
could discriminate between early and late performance, perhaps too
well. Initial measure rejection on the basis of measure
intercorrelations appeared to be quite drastic. In a few analyses
the wrong member of a correlated pair was dropped relative to
outside criteria, such as ease of implementation.

In a few cases there was an apparent conflict between the
criterion of discrimination and the criterion of adequate
performance description. These issues are discussed briefly in
the following paragraphs.

CORRELATION CRITERION. The intercorrelations among the candidate
measures were higher than expected. The Day 1 vs. Day 7 data shown
previously (Table 11) were typical. In those data, dropping the
right-hand member of any pair that correlated above r = .69
produced a substantial reduction in measures (from 24 to 13) on
the basis of Day 1 data alone. Further reduction (from 13 to 9)
occurred when the remaining Day 7 measures were correlated. The
final set reduction (from 9 to 7) occurred during the discriminant
analysis.
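The correlation screen described above can be sketched as a single left-to-right pass: a measure is kept only if its correlation with every measure already kept stays at or below the tolerance. This sketch assumes that the column order encodes the investigators' preference, that "correlated better than r = .69" refers to the absolute value of r, and that a measure is compared only against measures still in the set; `correlation_screen` is a hypothetical name.

```python
def correlation_screen(corr, threshold=0.69):
    """Left-to-right screen: keep a measure only if |r| with every kept measure is <= threshold.

    corr: symmetric correlation matrix (list of lists), columns in preference order.
    """
    kept = []
    for j in range(len(corr)):
        if all(abs(corr[i][j]) <= threshold for i in kept):
            kept.append(j)
    return kept


corr = [
    [1.00, 0.80, 0.30, 0.10],
    [0.80, 1.00, 0.75, 0.20],
    [0.30, 0.75, 1.00, 0.72],
    [0.10, 0.20, 0.72, 1.00],
]
print(correlation_screen(corr))                  # [0, 2]
print(correlation_screen(corr, threshold=0.76))  # [0, 2, 3]
```

With the tolerance loosened to .76, within the .74 to .82 range proposed in this section, measure 3 is retained as well.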

As a result, it is possible that the criterion for selection (the
ability of the set to discriminate) influenced the final measure
set less than the investigators would have liked. The resulting
"very clean" rotations of the factor structures suggested possible
oversimplification of the performance dimensions. Consequently,
it is suggested that further work with the DISCRIM SELECT
procedure examine a slightly larger correlation tolerance, in the
range of r = .74 to r = .82.

MEASURE  PRIORITIES  FOR  SELECTION.  The  arbitrary  decision  rule 
was  established  that  the  right-hand  member  of  a  correlated  pair 
was  to  be  dropped.  At  first  it  was  thought  that  the  data  could 
be  arranged  generally  from  left  to  right  to  reflect  external 
criteria,  such  as  those  measures  which  are  easier,  faster  or 
less  expensive  to  implement.  However,  this  simple,  linear 
scheme  did  not  always  produce  the  desired  result. 

A measure priority scheme should be added to the DISCRIM SELECT
procedure. It should cause rejection of the lower-priority member
of any correlated pair of measures. Also, it might be necessary,
for reasons other than discrimination, to retain a particular
measure at all costs. The priority scheme might need to handle
such complexities as the following: if A, B, and C are dropped,
keep D.
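A priority scheme of the kind proposed above might look like the following sketch. Everything here is an assumption for illustration: measures are visited in priority order (a smaller number means higher priority), protected measures in `must_keep` are placed in the kept set first, and each conditional rule is a hypothetical `(trigger_set, rescue)` pair read as "if every measure in trigger_set was dropped, reinstate rescue."

```python
def priority_screen(corr, priority, must_keep=(), rules=(), threshold=0.69):
    """Correlation screen that rejects the lower-priority member of each correlated pair.

    corr:      symmetric correlation matrix (list of lists)
    priority:  priority[m] for each measure; a smaller number means higher priority
    must_keep: measures retained at all costs
    rules:     (trigger_set, rescue) pairs, e.g. ({A, B, C}, D) for
               "if A, B, and C are dropped, keep D"
    """
    order = sorted(range(len(corr)), key=lambda m: priority[m])
    kept = [m for m in order if m in must_keep]  # protected measures go in first
    for m in order:
        if m in kept:
            continue
        # Keep m only if it is not highly correlated with any measure already kept.
        if all(abs(corr[k][m]) <= threshold for k in kept):
            kept.append(m)
    dropped = set(range(len(corr))) - set(kept)
    for trigger, rescue in rules:
        if set(trigger) <= dropped and rescue in dropped:
            kept.append(rescue)
            dropped.discard(rescue)
    return sorted(kept)


corr = [
    [1.00, 0.80, 0.30, 0.10],
    [0.80, 1.00, 0.75, 0.20],
    [0.30, 0.75, 1.00, 0.72],
    [0.10, 0.20, 0.72, 1.00],
]
print(priority_screen(corr, priority=[3, 0, 1, 2]))                      # [1, 3]
print(priority_screen(corr, priority=[3, 0, 1, 2], rules=[({0}, 2)]))    # [1, 2, 3]
print(priority_screen(corr, priority=[3, 0, 1, 2], must_keep={0}))       # [0, 2]
```

Unlike a purely left-to-right screen, the outcome here depends on the declared priorities rather than on column position.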

DESCRIPTION VS. DISCRIMINATION. Reducing the measures to a set
that significantly discriminates might yield a final set with
weakened power to describe all important dimensions of
performance. For example, holding roll attitude might be a very
important part of straight-and-level flight performance; however,
if the variance due to roll attitude holding does not change
substantially during training, the measure might fail to emerge
from a discriminant analysis. This problem was addressed in the
early design of the procedure.
The DISCRIM SELECT procedure was organized so that the final
measure set should retain the ability to describe performance.
Following rejection of highly correlated measures, a principal
components analysis produced a list of factors ordered according
to the amount of variance each contributed. On the basis of an
investigator-specified criterion (the minimum percent variance to
be accounted for by any factor in the final, reduced set; in this
case, 7 percent), the minimum measure set size was defined. After
initial tolerance testing, the procedure worked well most of the
time.
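One reading of the rule above is that the minimum set size equals the number of principal components that each account for at least the specified share of total variance. This sketch assumes that reading and works from the correlation matrix of the measures remaining after the correlation screen; `minimum_set_size` and `min_share` are hypothetical names.

```python
import numpy as np


def minimum_set_size(corr, min_share=0.07):
    """Count principal components that each explain at least `min_share` of total variance."""
    eigvals = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    shares = eigvals / eigvals.sum()  # for a correlation matrix the sum equals the number of measures
    return int(np.sum(shares >= min_share))


corr = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(minimum_set_size(corr))                  # 3 (shares are .475, .25, .25, .025)
print(minimum_set_size(corr, min_share=0.02))  # 4
```

Under this reading, tolerance trials between 5.5 and 6.5 percent correspond to `min_share` values of 0.055 to 0.065.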

Statistically, we could expect the results to go awry once in a
while. They did. In at least one case it was judged that the
discriminant analysis stopped too soon because it reached the
minimum measure set size; a low communality in the bottom measure,
which would have been dropped in the next iteration, suggested
that an additional iteration would have produced a significant
discrimination. In a second case the procedure went too far;
although the second-to-last iteration produced a significant
discrimination, the last iteration resulted in an insignificant
overall discrimination.

The first case above (stopping too soon) has to be accepted if we
continue to insist that the discriminating set have sufficient
descriptive power. However, further testing of the variance
tolerance appears necessary. Seven percent might have been too
high; preliminary testing suggested that five percent was too low.
Trials in the range of 5.5 to 6.5 percent appear warranted.

The  second  case  above  (going  too  far)  can  be  corrected  by 
adding  the  capability  to  test  the  statistical  significance  of 
the  overall  discrimination  in  the  program.  A  subroutine  to 
compute  the  exact  probability  of  the  F-ratio  should  be  added. 
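A subroutine for the exact upper-tail probability of an F-ratio can use the standard identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 f), where I_x is the regularized incomplete beta function. This dependency-free sketch evaluates I_x with the usual continued fraction (modified Lentz method); the function names are hypothetical, and a library routine such as `scipy.stats.f.sf` would serve the same purpose.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_inc_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function, for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges fastest, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_ratio_probability(f, df1, df2):
    """Exact P(F > f) for an F distribution with df1 and df2 degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return regularized_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


print(round(f_ratio_probability(1.0, 2, 10), 6))  # 0.401878, i.e. 1.2 ** -5
```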

The logic should be changed to test for a significant F. Once
significance is achieved, the program should continue to iterate
normally unless F becomes insignificant. If that happens, the
previous iteration should be taken as the result. Note that F must
first become significant before an insignificant F can cause a
stop.
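The revised stopping logic can be sketched as a small state machine over the sequence of iterations. In this sketch each iteration is assumed to yield the current measure set together with the probability of its overall F-ratio; the function name, the `alpha` parameter, and the choice to return the last set when significance is never reached are assumptions, since the text does not say what should happen in that case.

```python
def select_with_f_stop(iterations, alpha=0.05):
    """Return the measure set chosen by the revised stopping rule.

    iterations: iterable of (measure_set, p_value) pairs in the order the
    elimination loop produces them; p_value is the probability of the overall F.
    """
    reached_significance = False
    previous = None
    for measures, p_value in iterations:
        if p_value < alpha:
            reached_significance = True
        elif reached_significance:
            # F was significant earlier and has now become insignificant:
            # stop and report the previous iteration.
            return previous
        previous = measures
    # Loop ended normally (e.g. the minimum set size was reached).
    return previous


history = [
    (("A", "B", "C", "D"), 0.20),  # not yet significant: keep iterating
    (("A", "B", "C"), 0.03),       # first significant F
    (("A", "B"), 0.04),            # still significant
    (("A",), 0.30),                # insignificant after significance: stop
]
print(select_with_f_stop(history))  # ('A', 'B')
```

Because an insignificant F is ignored until significance has first been reached, early non-significant iterations cannot trigger a premature stop.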

STATISTICAL ASSUMPTIONS. Multivariate techniques were explored as
part of a method to reduce the number of measures used to describe
significant aspects of performance change during training (rather
than in their more traditional application to personnel
selection). Although limited in the number of subjects, the study
involved the collection and processing of more than 20 million
numbers. Because of practical constraints, it was necessary to
assume that the number of observations (participants x
replications) could replace the number of participants found in
more conventional uses of multivariate techniques. While the use
of observations in this sense remains a researchable issue, it is
emphasized that the work reported herein is being continued in
order to establish a larger data base.
SECTION VI
CONCLUSIONS 


METHODS  ARE  AVAILABLE 

Engineering hardware and behavioral research methods are
available to provide pilot-system performance measurement for
many operational and training tasks. The major constraints appear
to relate primarily to the time and effort required to define and
test measurement. Method improvement should be undertaken to
minimize these costs of obtaining performance information and to
maximize the utility of that information.

METHOD  IMPROVEMENT 

The initial methodology developed during this study for reducing
candidate performance measures requires further elaboration and
refinement. Reducing the measures to a set that yields information
for performance prediction will require further tests of the
criteria for rejecting measures; rejection on the basis of simple
correlations appears erroneous in some cases. The discriminant
procedure also requires refinement in the elimination of
correlated measures; a priority elimination scheme appears
warranted in some cases.

Also,  exercise  of  the  selection  techniques  with  a  larger  data 
base  is  mandatory. 

The measurement development method requires the combined use of
analytical and empirical techniques; however, it depends on
empirical data collection more than is desirable. Empirical
methods are costly and time consuming, partly because multivariate
statistical procedures require large samples for maximum
effectiveness (Lane, 1971). In practical settings, sufficient time
is often simply not available for full use of this method. It is
hoped that means can be found to permit heavier emphasis on
analysis.

Over time, empirical data collection for measurement development
may be reduced if (1) attempts are made to collect empirical
results that are generalizable, and (2) measurement-relevant
information is catalogued for use by others. If some attempt is
made to preserve measurement development information, future data
collection efforts may conceivably be reduced.

The work reported here is based on simulation research. Similar
work involving inflight performance measurement will require
expensive inflight and ground measurement equipment
installations. Because considerable expense is involved, it must
be justified in terms of the benefits accruing from the
availability of performance information; however, such a tradeoff
analysis for justification itself requires a measurement system
to generate data. Perhaps small-scale test systems should be
developed for the purpose of exploring potential payoff.

The  current  methods  using  ground-based  computer  equipment 
can  be  improved  by  (1)  new  measures  suggested  by  the  empirical 
tests,  (2)  better  computer  algorithms  for  definition  of 
measures,  (3)  implementation  so  that  measurement  can  be  computed 
and  used  as  a  simulated  flight  is  performed,  and  (4)  selection 
techniques  which  include  diagnostic  as  well  as  discriminating 
and  predicting  measurement  properties.  Further,  if  test  and 
evaluation  efforts  can  be  initiated  which  focus  on  measurement 
and  operational  information  needs,  measurement  development 
efforts  should  benefit  from  the  feedback  provided. 
SECTION VII
RECOMMENDATIONS 


It  is  recommended  that: 

a.  Improvement of the discriminant selection method be
undertaken through further work with selection criteria,
along with the addition of a priority scheme to control
measure rejection during the initial correlation analysis.

b.  Improvement of the canonical correlation prediction method
be undertaken by implementing new algorithms that consider
measures loading on more than one factor and measures on
both sides of the prediction equation.

c.  More  data  be  collected  with  the  same  experimental 
design,  data  collection  and  measure  producing 
software  to  permit  the  acquisition  of  more 
observations  and  participants  for  the  above  method 
improvement. 

d.  After the desired measure sets and the conditions that
control measurement are defined from the above work, a
real-time programming effort be undertaken to modify the
Instrument Flight Maneuvers program accordingly.

e.  Following modification of the Instrument Flight Maneuvers
program, an evaluation of the measurement subsystem be
conducted during automated training.

f.  The performance measurement methods reported and
referenced herein be considered for application to
simulation and instructional aircraft environments.
As a supporting comment, sufficient work has been
done to date to justify the conclusion that
statistical and rational methods can be applied to
the sensible specification of performance measures
in manned-vehicle training. Training commands
appear to have specific needs for improved
measurement. Since measurement of the kind addressed
herein may take some investment and lead time, it
is suggested that fine-tuning the selection methods
need not hold back the process of obtaining
measurement capability. Ultimately, measurement
studies, or at least verification of measurement,
must be conducted in operational training settings
to ensure the best utilization.